
    Histogram of Oriented Principal Components for Cross-View Action Recognition

    Existing techniques for 3D action recognition are sensitive to viewpoint variations because they extract features from depth images, which are viewpoint dependent. In contrast, we directly process pointclouds for cross-view action recognition from unknown and unseen views. We propose the Histogram of Oriented Principal Components (HOPC) descriptor, which is robust to noise, viewpoint, scale and action speed variations. At a 3D point, HOPC is computed by projecting the three scaled eigenvectors of the pointcloud within its local spatio-temporal support volume onto the vertices of a regular dodecahedron. HOPC is also used to detect Spatio-Temporal Keypoints (STKs) in 3D pointcloud sequences, so that only the view-invariant STK descriptors (Local HOPC descriptors) at these key locations are used for action recognition. We also propose a global descriptor, computed from the normalized spatio-temporal distribution of STKs in 4-D, which we refer to as STK-D. We have evaluated the performance of our proposed descriptors against nine existing techniques on two cross-view and three single-view human action recognition datasets. Experimental results show that our techniques provide significant improvements over state-of-the-art methods.
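    The core of the descriptor is a small linear-algebra step. Below is a minimal, illustrative Python/NumPy sketch of a HOPC-style histogram at one 3D point; the spatio-temporal support selection, the paper's quantization threshold, and eigenvector sign disambiguation are omitted, and all function names are our own rather than the authors' code.

        import numpy as np

        def dodecahedron_vertices():
            # 20 unit directions: the vertices of a regular dodecahedron.
            phi = (1 + np.sqrt(5)) / 2
            verts = [(x, y, z) for x in (-1, 1) for y in (-1, 1) for z in (-1, 1)]
            for s1 in (-1, 1):
                for s2 in (-1, 1):
                    verts += [(0, s1 / phi, s2 * phi),
                              (s1 / phi, s2 * phi, 0),
                              (s1 * phi, 0, s2 / phi)]
            v = np.asarray(verts, dtype=float)
            return v / np.linalg.norm(v, axis=1, keepdims=True)

        def hopc(points):
            # points: (N, 3) neighbours inside the local spatio-temporal support.
            centered = points - points.mean(axis=0)
            cov = centered.T @ centered / len(points)
            evals, evecs = np.linalg.eigh(cov)          # eigenvalues, ascending
            order = np.argsort(evals)[::-1]             # principal components first
            dirs = dodecahedron_vertices()              # (20, 3)
            hist = []
            for k in order:
                proj = dirs @ (evals[k] * evecs[:, k])  # project scaled eigenvector
                hist.append(np.clip(proj, 0.0, None))   # simplified one-sided binning
            hist = np.concatenate(hist)
            return hist / (np.linalg.norm(hist) + 1e-12)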

    Action Classification with Locality-constrained Linear Coding

    We propose an action classification algorithm which uses Locality-constrained Linear Coding (LLC) to capture discriminative information about human body variations in each spatiotemporal subsequence of a video sequence. Our method divides the input video into equally spaced, overlapping spatiotemporal subsequences, each of which is decomposed into blocks and then cells. We use the Histogram of Oriented Gradient (HOG3D) feature to encode the information in each cell. We justify the use of LLC for encoding the block descriptors by demonstrating its superiority over Sparse Coding (SC). Our sequence descriptor is obtained via a logistic regression classifier with L2 regularization. We evaluate and compare our algorithm with ten state-of-the-art algorithms on five benchmark datasets. Experimental results show that, on average, our algorithm gives better accuracy than these ten algorithms.
    Comment: ICPR 2014
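    The LLC code for a block descriptor has a closed-form approximation (from the original LLC paper, Wang et al., CVPR 2010). A minimal sketch, assuming a learned codebook; the parameter defaults are illustrative, not the paper's settings:

        import numpy as np

        def llc_encode(x, codebook, k=5, beta=1e-4):
            # x: (d,) local feature (e.g. a HOG3D block descriptor);
            # codebook: (M, d) learned dictionary. Returns a sparse (M,) code.
            # 1. Restrict to the k nearest atoms (the locality constraint).
            dists = np.linalg.norm(codebook - x, axis=1)
            idx = np.argsort(dists)[:k]
            B = codebook[idx]                     # (k, d) local basis
            # 2. Solve min ||x - c^T B||^2 subject to sum(c) = 1.
            z = B - x                             # shift the basis to the origin
            C = z @ z.T                           # (k, k) covariance
            C += beta * np.trace(C) * np.eye(k)   # regularize for stability
            c = np.linalg.solve(C, np.ones(k))
            c /= c.sum()                          # enforce the constraint
            # 3. Scatter back into a full-length code.
            code = np.zeros(len(codebook))
            code[idx] = c
            return code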

    Theoretical Design and Analysis of Multivolume Digital Assays with Wide Dynamic Range Validated Experimentally with Microfluidic Digital PCR

    This paper presents a protocol using theoretical methods and free software to design and analyze multivolume digital PCR (MV digital PCR) devices; the theory and software are also applicable to the design and analysis of dilution series in digital PCR. MV digital PCR minimizes the total number of wells required for “digital” (single molecule) measurements while maintaining high dynamic range and high resolution. In some examples, multivolume designs with fewer than 200 total wells are predicted to provide a dynamic range with 5-fold resolution similar to that of single-volume designs requiring 12,000 wells. Mathematical techniques were employed and extended to maximize the information obtained from each experiment and to quantify device performance, and the designs were experimentally validated using the SlipChip platform. MV digital PCR was demonstrated to perform reliably, and results from wells of different volumes agreed with one another. No artifacts due to different surface-to-volume ratios were observed, and single-molecule amplification in volumes ranging from 1 to 125 nL was self-consistent. The device presented here was designed to meet the testing requirements for measuring clinically relevant levels of HIV viral load at the point of care (in plasma, 1,000,000 molecules/mL), and the predicted resolution and dynamic range were experimentally validated using a control sequence of DNA. This approach simplifies digital PCR experiments, saves space, enables multiplexing by using a separate area for each sample on one chip, and facilitates the development of new high-performance diagnostic tools for resource-limited applications. The theory and software presented here are general and applicable to designing and analyzing other digital analytical platforms, including digital immunoassays and digital bacterial analysis. The approach is not limited to SlipChip and could also be useful for designing systems on valve-based and droplet-based platforms. In a separate publication by Shen et al. (J. Am. Chem. Soc., 2011, DOI: 10.1021/ja2060116), this approach is used to design and test digital RT-PCR devices for quantifying RNA.
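    The statistics behind such designs are standard Poisson/most-probable-number calculations: a well of volume v is negative with probability exp(-λv), so the concentration λ can be estimated by maximum likelihood across all volume groups. A minimal sketch of that estimator (this is not the paper's software, and the example counts are invented):

        import numpy as np
        from scipy.optimize import brentq

        def mpn_estimate(volumes_nl, n_wells, n_positive):
            # Maximum-likelihood concentration (molecules/nL) from a multivolume
            # digital experiment, assuming Poisson loading of the wells.
            v = np.asarray(volumes_nl, dtype=float)
            n = np.asarray(n_wells, dtype=float)
            p = np.asarray(n_positive, dtype=float)

            def score(lam):
                # Derivative of the log-likelihood; decreasing in lam.
                with np.errstate(over="ignore"):
                    return np.sum(p * v / np.expm1(lam * v)) - np.sum((n - p) * v)

            return brentq(score, 1e-12, 1e3)   # bracket and solve score(lam) = 0

        # Hypothetical 1-125 nL design with invented counts of positive wells:
        lam = mpn_estimate([1, 5, 25, 125], [160, 160, 160, 32], [12, 50, 130, 30])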

    #REVAL: a semantic evaluation framework for hashtag recommendation

    Automatic evaluation of hashtag recommendation models is a fundamental task in many online social network systems. In the traditional evaluation method, the hashtags recommended by an algorithm are first compared with the ground-truth hashtags for exact correspondences. The number of exact matches is then used to calculate the hit rate, hit ratio, precision, recall, or F1-score. This way of evaluating hashtag similarity is inadequate, as it ignores the semantic correlation between the recommended and ground-truth hashtags. To tackle this problem, we propose a novel semantic evaluation framework for hashtag recommendation, called #REval. The framework includes an internal module, referred to as BERTag, which automatically learns hashtag embeddings. Using our proposed #REval-hit-ratio measure, we investigate how the #REval framework performs under different word embedding methods and different numbers of synonyms and hashtags in the recommendation. Our experiments on three large datasets show that #REval gives more meaningful hashtag synonyms for hashtag recommendation evaluation. Our analysis also highlights the sensitivity of the framework to the word embedding technique, with #REval based on BERTag outperforming #REval based on FastText and Word2Vec.
    Comment: 18 pages, 4 figures
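    The exact #REval-hit-ratio definition is in the paper; as a rough illustration of the idea of crediting semantic rather than exact matches, here is a hedged sketch in which a recommendation counts as a hit if it falls among the embedding-space synonyms of a ground-truth hashtag. The function names and the synonym rule are our own assumptions:

        import numpy as np

        def semantic_hit_ratio(recommended, ground_truth, embed, n_synonyms=5):
            # embed: dict mapping a hashtag to its vector (e.g. from BERTag,
            # FastText or Word2Vec). Cosine similarity picks the synonyms.
            vocab = list(embed)
            vectors = np.array([embed[t] for t in vocab], dtype=float)
            vectors /= np.linalg.norm(vectors, axis=1, keepdims=True)

            def synonyms(tag):
                q = embed[tag] / np.linalg.norm(embed[tag])
                nearest = np.argsort(vectors @ q)[::-1][:n_synonyms + 1]
                return {vocab[i] for i in nearest}   # includes the tag itself

            expanded = set().union(*(synonyms(t) for t in ground_truth))
            hits = sum(1 for t in recommended if t in expanded)
            return hits / len(recommended)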

    A Comparative Review of Recent Kinect-based Action Recognition Algorithms

    Video-based human action recognition is currently one of the most active research areas in computer vision. Various studies indicate that the performance of action recognition depends strongly on the type of features being extracted and on how the actions are represented. Since the release of the Kinect camera, a large number of Kinect-based human action recognition techniques have been proposed in the literature. However, there is still no thorough comparison of these Kinect-based techniques grouped by feature type, such as handcrafted versus deep learning features and depth-based versus skeleton-based features. In this paper, we analyze and compare ten recent Kinect-based algorithms for both cross-subject and cross-view action recognition using six benchmark datasets. In addition, we have implemented and improved some of these techniques and included their variants in the comparison. Our experiments show that most methods perform better on cross-subject action recognition than on cross-view action recognition, that skeleton-based features are more robust for cross-view recognition than depth-based features, and that deep learning features are suitable for large datasets.
    Comment: Accepted by the IEEE Transactions on Image Processing

    Hallucinating IDT Descriptors and I3D Optical Flow Features for Action Recognition with CNNs

    In this paper, we revive the use of old-fashioned handcrafted video representations for action recognition and put new life into these techniques via a CNN-based hallucination step. Although the I3D model (among others) already uses RGB and optical flow frames, it thrives on combining its output with Improved Dense Trajectory (IDT) features: low-level video descriptors encoded via Bag-of-Words (BoW) and Fisher Vectors (FV). Such a fusion of CNNs and handcrafted representations is time-consuming due to pre-processing, descriptor extraction, encoding and parameter tuning. We therefore propose an end-to-end trainable network with streams which learn the IDT-based BoW/FV representations at the training stage and are simple to integrate with the I3D model. Specifically, each stream takes I3D feature maps ahead of the last 1D convolutional layer and learns to 'translate' these maps into BoW/FV representations, so our model can hallucinate and use such synthesized BoW/FV representations at the testing stage. We show that even the features of the entire I3D optical flow stream can be hallucinated, further simplifying the pipeline. Our model saves 20-55 hours of computation and yields state-of-the-art results on four publicly available datasets.
    Comment: First two authors contributed equally. This paper is accepted by ICCV'19
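    As a rough architectural sketch (the layer sizes, temporal pooling and training loop below are our assumptions, not the paper's exact design), each hallucination stream can be read as a small regressor trained to map I3D feature maps onto precomputed IDT-based BoW/FV targets:

        import torch
        import torch.nn as nn

        class HallucinationStream(nn.Module):
            # Maps I3D feature maps taken ahead of the last 1D conv layer to a
            # BoW/FV-sized vector; at test time this output stands in for the
            # real handcrafted representation.
            def __init__(self, i3d_dim=1024, target_dim=2048):
                super().__init__()
                self.translate = nn.Sequential(
                    nn.Linear(i3d_dim, 2048), nn.ReLU(),
                    nn.Linear(2048, target_dim),
                )

            def forward(self, feat_map):
                # feat_map: (batch, i3d_dim, time); pool over time, then 'translate'.
                return self.translate(feat_map.mean(dim=-1))

        # One training step against FV targets precomputed for the training clips
        # (random tensors stand in for real data here):
        stream, mse = HallucinationStream(), nn.MSELoss()
        opt = torch.optim.Adam(stream.parameters(), lr=1e-4)
        feat, fv_target = torch.randn(8, 1024, 16), torch.randn(8, 2048)
        loss = mse(stream(feat), fv_target)
        opt.zero_grad(); loss.backward(); opt.step()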